Chi-Square (X2) Test for Independence

Chi-square Test for Independence is a statistical test commonly used to determine if there is a significant association between two variables.  For example, a biologist might want to determine if two species of organisms associate (are found together) in a community.

Picture

Does Species A associate with Species B?

Picture

Species A

Picture

Species B

Just like other statistical tests, the Chi-Square Test for Independence tests two hypotheses:

Null Hypothesis:

"There is not a significant association between variables, the variables are independent of each other; any association between variables is likely due to chance and sampling error."

For example:  

  • There is no significant association between Species A and Species B; the species are independent of each other.  The location of Species A has no effect on the location of Species B.

Alternative Hypothesis:

"There is a significant (positive or negative) association between variables; the association between variables is likely not due to chance or sampling error."

For example:  

  • There is a significant association between Species A and Species B; the species are dependent.  Either Species A significantly associates with Species B or Species A does not significantly associate with Species B.

How to Calculate a Chi-Square Test of Independence

The first step is to collect raw data for the occurrence of each variable.  This is often done via random sampling using a quadrant.  In our example, there are five quadrants.  Determine:

Picture

Then create a "contingency table" to display your results.  In the Chi-Square test, these are your OBSERVED values.

Picture

Next you need to determine what would be EXPECTED assuming the species are randomly distributed with respect to each other.   Expected frequencies = (row total X column total) / grand total

Picture

Picture

Now that you have OBSERVED and EXPECTED values, apply the Chi-Square formula in each part of the contingency table by determining (O-E)2 / E for each box.

Picture

Picture

Picture

The final calculated chi-square value is determined by summing the values:


The calculated X2 value is than compared to the “critical value  X2” found in an X2 distribution table.  The X2 distribution table represents a theoretical curve of  expected results. The expected results are based on DEGREES OF  FREEDOM.

Degrees of Freedom = (number of rows - 1) X (number of columns - 1)

In our example, DF = (2-1) X (2-1) = 1 X 1 = 1

*the row and column for the total in the contingency table are not included

 

The X2 distribution table is organized by the Level of  Significance.  The level of significance is the maximum tolerable probability of accepting a false null hypothesis.  We use 0.05.  ​

Picture

For example, with a DF=1, a value greater than 3.841 is required to be considered statistically significant (at p = 0.05). Since the X2 we calculated (0.4) is less than 3.841, there is NOT a significant association between Species A and Species B.  The location of Species A has no significant effect on the location of Species B, any association between species is likely due to chance and sampling error.